A Stochastic Articulatory-to-acoustic Mapping as a Basis for Speech Recognition

Authors

  • Patrick F. Valdez
  • John Hogden
Abstract

Hidden Markov models (HMMs) of speech acoustics are the current state of the art in speech recognition, but these models bear little resemblance to the processes underlying speech production (Lee, 1989). In this respect, using an HMM to model speech acoustics is like using a Gaussian distribution to model data generated by a Poisson process: to the extent that the model is not an accurate representation of the generating process, the accuracy of the model and the meaning of the inferred parameters are limited. This model mismatch likely contributes to the fact that state-of-the-art recognition performance (word accuracy) on recorded telephone conversations is only around 60-65%. There have been recent attempts to create stochastic models of speech acoustics that make more realistic assumptions about the mechanisms underlying speech production (Bakis, 1991; Deng, 1998; Hogden, 1998; Picone et al., 1999). In this paper we describe two stochastic models of speech production: Conditional Observable Maximum Likelihood Mapping (CO-MALCOM) and its predecessor, Maximum Likelihood Continuity Mapping (MALCOM). The main component of both of these models is a stochastic mapping between speech acoustics and speech articulation. A counter-intuitive aspect of the stochastic mapping is that its parameters can be found using only acoustic data. While most speech researchers are familiar with the fact that HMM parameters can be estimated from acoustics alone, many still find it surprising that the mapping between speech acoustics and speech articulator positions (positions of the tongue, jaw, lips, ...) can be found without articulator position measurements. Nonetheless, there are theoretical and experimental reasons to believe that MALCOM and its allies learn a stochastic mapping between articulator positions and speech acoustics.
Furthermore, CO-MALCOM can be combined with standard speech recognition algorithms to form a speech recognition approach based on a production model. Results of experiments related to MALCOM are summarized, and the CO-MALCOM extension is described.

BACKGROUND: STATE-OF-THE-ART SPEECH RECOGNITION

In many realistic domains, automatic speech recognition performance is inadequate. To be concrete, at the National Institute of Standards and Technology 1998 HUB-5 Speech Recognition Evaluation, state-of-the-art systems had about a 60-65% word recognition rate on casual ...


Similar resources

Pseudo-Articulatory Representations and the Use of Syllable Structure for Speech Recognition

The alternative approach for speech recognition proposed here is based on pseudo-articulatory representations (PARs), which can be described as an approximation of distinctive features, and aims to establish a mapping between them and their acoustic specifications. This mapping, which is used as the basis for recognition, is first done for vowels. It is obtained using multiple regression analysis af...


Acoustic-to-articulatory Inversion of Speech: a Review

In this article, we review a specific speech processing research area called acoustic-to-articulatory inversion of speech, or simply speech inversion, which has attracted many researchers and scientists during the last 35 years. The underlying problem refers to the mapping from the acoustic space, which is well-defined since it consists of acoustic signals, to the articulatory space. The latter...


Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition

Studies have shown that articulatory information helps model speech variability and, consequently, improves speech recognition performance. But learning speaker-invariant articulatory models is challenging, as speaker-specific signatures in both the articulatory and acoustic space increase complexity of speech-to-articulatory mapping, which is already an ill-posed problem due to its inherent no...


A New Bidirectional Neural Network Model for the Acoustic-Articulatory Inversion Mapping For Speech Recognition

In this paper, a new bidirectional neural network for better acoustic-articulatory inversion mapping is proposed. The model is motivated by the parallel structure of the human brain, which processes information through forward-inverse connections. In other words, there would be a feedback from articulatory system to the acoustic signals emitted from that organ. Inspired by this mechanism, a new bidire...


A Rough Guide to the Acoustic-to-articulatory Inversion of Speech

This article reviews a specific speech research area called acoustic-to-articulatory inversion of speech, or speech inversion, which refers to the problem of mapping the acoustic speech signal onto a space describing the configuration of the human vocal tract that actually produced this signal. This space may be modeled in a variety of ways, such as with trajectories of the movement of the artic...



Publication date: 2000